The human proteome contains a vast network of interacting kinases and substrates. Even though some kinases have proven to be immensely useful therapeutic targets, the majority remain understudied. In this work, we propose a novel knowledge graph representation learning approach to predict novel interaction partners for understudied kinases. Our approach uses a phosphoproteomic knowledge graph constructed by integrating data from iPTMnet, Protein Ontology, Gene Ontology, and BioKG. Representations of kinases and substrates in this knowledge graph are learned by performing directed random walks over triples, coupled with a modified SkipGram or CBOW model. These representations are then used as input to a supervised classification model to predict novel interactions for understudied kinases. We also provide a post-prediction analysis of the predicted interactions and an ablation study of the phosphoproteomic knowledge graph to gain insights into the biology of understudied kinases.
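As a concrete illustration (not code from the paper), here is a minimal Python sketch of the walk-then-embed recipe, assuming a toy triple store and gensim's stock Word2Vec in place of the authors' modified SkipGram/CBOW; all entity and relation names below are invented:

```python
import random
from gensim.models import Word2Vec

# Toy directed triple store standing in for the phosphoproteomic knowledge
# graph (the real graph integrates iPTMnet, Protein Ontology, Gene Ontology,
# and BioKG).
triples = [
    ("KINASE:AKT1", "phosphorylates", "SUBSTRATE:GSK3B"),
    ("KINASE:AKT1", "located_in", "GO:cytoplasm"),
    ("SUBSTRATE:GSK3B", "participates_in", "GO:signaling"),
]

# Adjacency over (relation, object) pairs, so walks follow edge direction.
out_edges = {}
for s, r, o in triples:
    out_edges.setdefault(s, []).append((r, o))

def directed_walk(start, length=8):
    """One directed random walk emitting entities and relations as tokens."""
    walk, node = [start], start
    for _ in range(length):
        if node not in out_edges:
            break
        r, o = random.choice(out_edges[node])
        walk += [r, o]
        node = o
    return walk

walks = [directed_walk(s) for s, _, _ in triples for _ in range(10)]
# sg=1 selects the skip-gram objective; sg=0 would select CBOW.
model = Word2Vec(walks, vector_size=64, window=4, min_count=1, sg=1)
vec = model.wv["KINASE:AKT1"]  # embedding fed to the downstream classifier
```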
In recent years, the United States has experienced an opioid epidemic with an unprecedented number of drug overdose deaths. Research has found that such overdose deaths are linked to neighborhood-level characteristics, thus providing an opportunity to identify effective interventions. Typically, techniques such as ordinary least squares (OLS) or maximum likelihood estimation (MLE) are used to document the neighborhood-level factors significant in explaining such adverse outcomes. These techniques, however, are less equipped to identify non-linear relationships between confounding factors. Hence, in this study, we apply machine-learning-based techniques to identify the opioid risks of neighborhoods in Delaware and explore the correlation of these factors using SHapley Additive exPlanations (SHAP). We find that the factors related to the neighborhood environment, followed by education and then crime, are highly correlated with higher opioid risk. We also explore how these correlations change over the years to understand the changing dynamics of the epidemic. Furthermore, we find that as the epidemic has shifted from legal (i.e., prescription) to illegal (i.e., heroin and fentanyl) drugs in recent years, the correlation of environment-, crime-, and health-related variables with opioid risk has increased significantly, while the correlation of economic and socio-demographic variables has decreased. The correlation of education-related factors has increased slightly in recent years, indicating a need to raise awareness about the opioid epidemic.
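For readers unfamiliar with SHAP, a minimal sketch of the analysis pattern on synthetic data follows; the feature names and the gradient-boosted model are placeholders, not the study's actual data or estimator:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for neighborhood-level features; the real study uses
# environment, education, crime, health, economic, and socio-demographic
# variables for Delaware neighborhoods.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)),
                 columns=["environment", "education", "crime", "income"])
y = 2 * X["environment"] + X["education"] ** 2 + rng.normal(size=500)

model = GradientBoostingRegressor().fit(X, y)

# TreeExplainer attributes each prediction to the input features, capturing
# the non-linear relationships that OLS/MLE coefficients can miss.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)  # ranks features by mean |SHAP| value
```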
State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges for each operation, which may lead to sub-optimal policies. To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment that allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations. RangeAugment uses an auxiliary loss based on image similarity as a measure to control the range of magnitudes of augmentation operations. As a result, RangeAugment has a single scalar parameter for search, image similarity, which we simply optimize via linear search. RangeAugment integrates seamlessly with any model and learns model- and task-specific augmentation policies. With extensive experiments on the ImageNet dataset across different networks, we show that RangeAugment achieves competitive performance to state-of-the-art automatic augmentation methods with 4-5 times fewer augmentation operations. Experimental results on semantic segmentation, object detection, foundation models, and knowledge distillation further show RangeAugment's effectiveness.
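A simplified sketch of the core idea follows, assuming PSNR as the image-similarity measure and a single brightness operation; the learnable endpoints a and b stand in for the magnitude range, and the target similarity plays the role of RangeAugment's scalar search parameter:

```python
import torch

# Learnable magnitude range [a, b] for one augmentation op (brightness).
a = torch.tensor(0.8, requires_grad=True)
b = torch.tensor(1.2, requires_grad=True)

def psnr(x, y, eps=1e-8):
    # Peak signal-to-noise ratio for images scaled to [0, 1].
    mse = torch.mean((x - y) ** 2) + eps
    return 10.0 * torch.log10(1.0 / mse)

def augment(x):
    # Sample a magnitude uniformly from the current range via the
    # reparameterization m = a + u * (b - a), keeping gradients w.r.t. a, b.
    u = torch.rand(())
    m = a + u * (b - a)
    return torch.clamp(x * m, 0.0, 1.0)

target_similarity = 30.0  # desired PSNR between clean and augmented images
opt = torch.optim.SGD([a, b], lr=1e-3)

x = torch.rand(8, 3, 32, 32)  # stand-in batch of images in [0, 1]
x_aug = augment(x)
# Auxiliary loss pulls the augmentation strength toward the target
# similarity; in the paper this term is added to the task loss.
aux_loss = (psnr(x, x_aug) - target_similarity) ** 2
opt.zero_grad()
aux_loss.backward()
opt.step()
```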
In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data. Basically, we design and synthesize a wide range of potential errors and check whether they result in a commensurate drop in the metric scores. We examine a range of recently proposed evaluation metrics based on pretrained language models, for the tasks of open-ended generation, translation, and summarization. Our experiments reveal interesting insensitivities, biases, or even loopholes in existing metrics. For example, we find that BERTScore ignores truncation errors in summarization, and MAUVE (built on top of GPT-2) is insensitive to errors at the beginning of generations. Further, we investigate the reasons behind these blind spots and suggest practical workarounds for a more reliable evaluation of text generation.
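A stress test of this kind can be only a few lines. The sketch below synthesizes a truncation error and compares BERTScore F1 for a full versus truncated candidate (the example sentences are invented, and the first call downloads a pretrained model):

```python
from bert_score import score

reference = ["The committee approved the budget after a long debate "
             "and scheduled a final vote for next week."]
full = ["The committee approved the budget after a long debate "
        "and scheduled a final vote for next week."]
truncated = ["The committee approved the budget after a long"]  # synthetic error

# A robust metric should score the truncated candidate noticeably lower;
# a near-identical score would reveal a blind spot.
_, _, f_full = score(full, reference, lang="en")
_, _, f_trunc = score(truncated, reference, lang="en")
print(f"full: {f_full.item():.3f}  truncated: {f_trunc.item():.3f}")
```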
Multi-lingual language models (LMs), such as mBERT, XLM-R, mT5, mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study, such as: Are the cross-lingual connections between every language pair equally strong? What properties of source and target language impact the strength of cross-lingual transfer? Can we quantify the impact of those properties on the cross-lingual transfer? In our investigation, we analyze a pre-trained mT5 to discover the attributes of cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret cross-lingual understanding of the mT5 model. Through these observations, one can favorably choose the best source language for a task, and can anticipate its training data demands. A key finding of this work is that similarity of syntax, morphology and phonology are good predictors of cross-lingual transfer, significantly more than just the lexical similarity of languages. For a given language, we are able to predict zero-shot performance, which increases on a logarithmic scale with the number of few-shot target language data points.
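A minimal sketch of how such a predictive model might look, assuming a linear fit on hypothetical similarity features and a log-transformed count of few-shot examples; all the numbers below are invented, and the paper's actual feature set is richer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical rows: one per (source, target) language pair, with linguistic
# similarity features and the number of few-shot target examples.
syntax_sim  = np.array([0.9, 0.7, 0.4, 0.8, 0.3])
phono_sim   = np.array([0.8, 0.6, 0.5, 0.7, 0.2])
n_few_shot  = np.array([10, 100, 1000, 50, 500])
performance = np.array([0.71, 0.69, 0.55, 0.70, 0.48])

# Performance grows on a logarithmic scale with target data, so we
# log-transform the count before the linear fit.
X = np.column_stack([syntax_sim, phono_sim, np.log(n_few_shot)])
reg = LinearRegression().fit(X, performance)

# Anticipate transfer performance for a new pair with 200 target examples.
pred = reg.predict([[0.85, 0.75, np.log(200)]])
```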
Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. Specifically, we cast downstream class labels as text prompts and continue optimizing the contrastive loss between image embeddings and class-descriptive prompt embeddings (contrastive finetuning). Our method consistently outperforms baselines across 7 distribution shifts, 6 transfer learning, and 3 few-shot learning benchmarks. On WILDS-iWILDCam, our proposed approach FLYP outperforms the top of the leaderboard by $2.3\%$ ID and $2.7\%$ OOD, giving the highest reported accuracy. Averaged across 7 OOD datasets (2 WILDS and 5 ImageNet associated shifts), FLYP gives gains of $4.2\%$ OOD over standard finetuning and outperforms the current state of the art (LP-FT) by more than $1\%$ both ID and OOD. Similarly, on 3 few-shot learning benchmarks, our approach gives gains up to $4.6\%$ over standard finetuning and $4.4\%$ over the state of the art. In total, these benchmarks establish contrastive finetuning as a simple, intuitive, and state-of-the-art approach for supervised finetuning of image-text models like CLIP. Code is available at https://github.com/locuslab/FLYP.
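The core of contrastive finetuning can be sketched in a few lines of PyTorch. Below, prompt embeddings are built from each example's class label, and the symmetric in-batch contrastive loss of CLIP pretraining is reused (a simplification that ignores duplicate labels within a batch):

```python
import torch
import torch.nn.functional as F

def flyp_style_loss(image_emb, prompt_emb, temperature=0.07):
    # image_emb, prompt_emb: (B, D); prompt_emb[i] embeds a text prompt
    # such as "a photo of a {label_i}" built from example i's class label.
    image_emb = F.normalize(image_emb, dim=-1)
    prompt_emb = F.normalize(prompt_emb, dim=-1)
    logits = image_emb @ prompt_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric image-to-text and text-to-image cross-entropy, as in CLIP.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Finetuning then continues to optimize this loss with both the image and text towers unfrozen, exactly mirroring the pretraining objective rather than swapping in a classification head.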
In the modern world, applications of data science and analytics to optimize or predict outcomes are ubiquitous. Data science and analytics have optimized almost every domain that exists in the market. In our survey, we focus on how analytics has been adopted in the field of sports and how it has contributed to the transformation of the game, from assessing players on the field and their selection, to predicting winning teams, to ticketing and the commercial aspects of marketing for major sporting events. We present the analytical tools, algorithms, and methodologies adopted for different sports in the field of sports analytics, along with our perspective on them, and we also compare and contrast these existing approaches. By doing so, we also present the best tools, algorithms, and analytical approaches for anyone who wishes to experiment with sports data and analyze various aspects of the game.
The development of digital technology and the growing popularity of sports have inspired innovators to take sports-inclined users to an entirely new level through the introduction of fantasy sports platforms (FSPs). Applications of data science and analytics are ubiquitous in the modern world; they open doors to deeper understanding and help with the decision-making process. We firmly believe that data science can be employed to predict a winning fantasy cricket team on the FSP Dream11. We build a predictive model that forecasts the performance of players in a prospective game. We employ a combination of greedy and knapsack algorithms to arrive at a combination of 11 players that forms a fantasy cricket team with the greatest statistical odds of being the strongest team, giving us a better chance of winning a bet on the Dream11 FSP. We use the PyCaret Python library to help us understand and employ the best regression algorithm for the problem statement, to make precise predictions. Furthermore, we use the Plotly Python library to give us visual insights into how teams and players perform, by computing statistical and subjective factors for prospective games. Interactive plots help us strengthen the recommendations of our predictive model. You either win big, win small, or lose a bet based on the performance of the players selected for your fantasy team in the prospective game, and our model increases your probability of winning big.
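A minimal sketch of the greedy half of the selection step, assuming a Dream11-style credit budget of 100 and ignoring role constraints (wicket-keeper, bowlers, etc.); the predicted points would come from the regression model chosen via PyCaret:

```python
def pick_team(players, budget=100.0, size=11):
    """Greedy selection by predicted points per credit under a budget.

    players: list of (name, predicted_points, credit_cost) tuples. A
    knapsack-style dynamic program could replace this loop for exact
    optimization, at higher cost.
    """
    team, spent = [], 0.0
    for name, points, cost in sorted(players, key=lambda p: p[1] / p[2],
                                     reverse=True):
        if len(team) < size and spent + cost <= budget:
            team.append(name)
            spent += cost
    return team, spent

# Toy squad with invented predictions and credit costs.
squad = [("Player A", 55.2, 10.5), ("Player B", 48.9, 9.0),
         ("Player C", 31.0, 8.5), ("Player D", 44.3, 9.5)]
team, spent = pick_team(squad, size=2)  # toy team size for illustration
```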
We present a robotic assembly system that streamlines the design-to-make workflow, going from a CAD model of a product's components to a fully programmed and adaptive assembly process. Our system captures (in a CAD tool) the intent of the assembly process for a specific robotic workcell and generates a recipe of task-level instructions. By integrating visual sensing with deep-learned perception models, the robots infer the necessary actions to assemble the design from the generated recipe. The perception models are trained directly from simulation, allowing the system to identify individual parts based on CAD information. We demonstrate the system with a workcell of two robots assembling interlocking 3D part designs. We first build and tune the assembly process in simulation and validate the generated recipe. Finally, the real robotic workcell assembles the design using the same behavior.
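The recipe of task-level instructions might be represented roughly as below; this is a hypothetical shape for illustration only, as the paper's actual recipe format is not reproduced here:

```python
from dataclasses import dataclass

@dataclass
class Step:
    # One hypothetical task-level instruction in a generated recipe.
    action: str        # e.g. "pick", "place", "insert"
    part_id: str       # part identifier taken from the CAD model
    target_pose: tuple # (x, y, z, roll, pitch, yaw) in the workcell frame

# A two-step toy recipe; real recipes are generated from the CAD tool.
recipe = [
    Step("pick",   "bracket_A", (0.40, 0.10, 0.02, 0.0, 0.0, 0.0)),
    Step("insert", "bracket_A", (0.55, 0.20, 0.05, 0.0, 0.0, 1.57)),
]
```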
Recent isotropic networks, such as ConvMixer and vision transformers, have found great success in visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation of methods for sharing parameters in isotropic networks (SPIN). We present a framework to formalize major weight sharing design decisions and perform a comprehensive empirical evaluation of this design space. Guided by our experimental results, we propose a weight sharing strategy that yields a family of models with better overall efficiency, in terms of FLOPs and parameters versus accuracy, compared to traditional scaling methods alone; for example, it compresses ConvMixer by 1.9x while improving accuracy on ImageNet. Finally, we perform a qualitative study to further understand the behavior of weight sharing in isotropic architectures. The code is available at https://github.com/apple/ml-pin.
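A minimal sketch of the extreme case of cross-layer weight sharing in a ConvMixer-like isotropic network, where a single block is reused at every layer (SPIN evaluates many finer-grained sharing strategies):

```python
import torch.nn as nn

class SharedIsotropicNet(nn.Module):
    # Cross-layer weight sharing: one block reused `depth` times, so the
    # parameter count is that of a single block regardless of depth.
    def __init__(self, dim=256, depth=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=1),
        )
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            x = x + self.block(x)  # same weights applied at every layer
        return x
```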